Visualizing SMPS Data using the py-smps library

The Scanning Mobility Particle Sizer (SMPS) is a high-resolution particle sizer that is commonly used in research to characterize the size distribution of aerosols.

The py-smps Python library provides a simple way to read, analyze, and visualize SMPS data. A loader (smps.io.load_file) imports the data exported from the SMPS, and two plotting functions are available (smps.plots.heatmap and smps.plots.histplot).

Below is a quick tutorial showing how to import the data, inspect it, and plot it. Any bugs with the software can be reported on GitHub.

Requirements

I personally recommend using Python 3 and leaning heavily on seaborn for visualization help. There are three required packages for this library:

Data

To make the process seamless, I recommend exporting your data from the SMPS in column format with a ',' delimiter. For units, dN/dlogDp is preferred, as it is the natural format for aerosol size distributions. An ambient dataset is available here.

Visualization

The beautification of plots is aided by using seaborn. For more information, check out their documentation! It's great.

Import the Library



In [1]:

    
import smps
import seaborn as sns
import os
import matplotlib
import matplotlib.pyplot as plt
import json

%matplotlib inline

# You can use seaborn to easily control how your plots appear
sns.set('notebook', style='ticks', font_scale=1.5, palette='dark')

smps.set()

print("smps v{}".format(smps.__version__))
print("seaborn v{}".format(sns.__version__))
print("matplotlib v{}".format(matplotlib.__version__))









    



smps v1.0.0
seaborn v0.9.0
matplotlib v3.0.0

Load the Data into an SMPS object

The SMPS loader (smps.io.load_file) returns an SMPS object which has several attributes including:

SMPS.raw
SMPS.df
SMPS.meta
SMPS.bins
SMPS.midpoints
SMPS.bin_labels
SMPS.histogram

`smps.io.load_file(fpath, column=True, **kwargs)`

Arguments

fpath: File Path for the data
column: If your data is in 'column' format, set True. Otherwise, set False



In [2]:

    
bos = smps.io.load_sample("boston")

Explore the SMPS Object

Let's take a look at the SMPS object that was returned by the loader.

`SMPS.meta`

The SMPS.meta attribute contains the meta information that was held in the SMPS text file. It is returned as a python dictionary.



In [3]:

    
print(json.dumps(bos.meta, indent=4))









    



{
    "Sample File": "C:\\Users\\Marduk\\Documents\\SMPS_data\\r20161122_SMPS.6.7.S80",
    "Classifier Model": "3080",
    "DMA Model": "3081",
    "DMA Inner Radius(cm)": "0.00937",
    "DMA Outer Radius(cm)": "0.01961",
    "DMA Characteristic Length(cm)": "0.44369",
    "CPC Model": "3775 Low Flow",
    "Gas Viscosity (kg/(m*s))": "1.822e-005",
    "Mean Free Path (m)": "6.642e-008",
    "Channels/Decade": "64",
    "Multiple Charge Correction": "FALSE",
    "Nanoparticle Aggregate Mobility Analysis": "FALSE",
    "Diffusion Correction": "FALSE",
    "Units": "dw/dlogDp",
    "Weight": "Number",
    "Lower Size (nm)": 21.2875,
    "Upper Size (nm)": 1000.0,
    "weight": "number",
    "units": "dw/dlogdp"
}
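Most of the header values above are stored as strings, so they need converting before they can be used in calculations. The snippet below is a minimal sketch, assuming the values arrive exactly as printed; the `meta` dictionary is a hand-copied subset of the output above and `as_bool` is a hypothetical helper, not part of py-smps.

```python
# A subset of the metadata printed above, copied by hand as strings.
meta = {
    "Gas Viscosity (kg/(m*s))": "1.822e-005",
    "Mean Free Path (m)": "6.642e-008",
    "Channels/Decade": "64",
    "Multiple Charge Correction": "FALSE",
    "Diffusion Correction": "FALSE",
}


def as_bool(text):
    """Interpret the 'TRUE'/'FALSE' strings used in the file header (hypothetical helper)."""
    return text.strip().upper() == "TRUE"


viscosity = float(meta["Gas Viscosity (kg/(m*s))"])    # kg/(m*s)
mean_free_path = float(meta["Mean Free Path (m)"])      # m
channels_per_decade = int(meta["Channels/Decade"])
multiple_charge = as_bool(meta["Multiple Charge Correction"])

print(viscosity, mean_free_path, channels_per_decade, multiple_charge)
```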

`SMPS.bins` and `SMPS.midpoints`

SMPS.bins is an n×3 array that contains the left edge, midpoint, and right edge of each bin in the dataset. SMPS.midpoints is simply the center column of SMPS.bins. NOTE: Diameters are expected to be in nm by default; this can be changed with the dp_units argument. All diameters are then converted to microns, which is why the values printed below are in microns.



In [4]:

    
# print out the first 4 bins
bos.bins[0:4]









    Out[4]:





array([[0.0212875, 0.0217   , 0.0220673],
       [0.0220673, 0.0225   , 0.0228757],
       [0.0228757, 0.0233   , 0.0237137],
       [0.0237137, 0.0241   , 0.0245824]])



In [5]:

    
# print out the midpoints
bos.midpoints









    Out[5]:





array([0.0217, 0.0225, 0.0233, 0.0241, 0.025 , 0.0259, 0.0269, 0.0279,
       0.0289, 0.03  , 0.0311, 0.0322, 0.0334, 0.0346, 0.0359, 0.0372,
       0.0385, 0.04  , 0.0414, 0.0429, 0.0445, 0.0461, 0.0478, 0.0496,
       0.0514, 0.0533, 0.0552, 0.0573, 0.0594, 0.0615, 0.0638, 0.0661,
       0.0685, 0.071 , 0.0737, 0.0764, 0.0791, 0.082 , 0.0851, 0.0882,
       0.0914, 0.0947, 0.0982, 0.1018, 0.1055, 0.1094, 0.1134, 0.1176,
       0.1219, 0.1263, 0.131 , 0.1358, 0.1407, 0.1459, 0.1512, 0.1568,
       0.1625, 0.1685, 0.1747, 0.1811, 0.1877, 0.1946, 0.2017, 0.2091,
       0.2167, 0.2247, 0.2329, 0.2414, 0.2503, 0.2595, 0.269 , 0.2788,
       0.289 , 0.2996, 0.3106, 0.322 , 0.3338, 0.346 , 0.3587, 0.3718,
       0.3854, 0.3995, 0.4142, 0.4294, 0.4451, 0.4614, 0.4783, 0.4958,
       0.514 , 0.5328, 0.5523, 0.5725, 0.5935, 0.6153, 0.6378, 0.6612,
       0.6854, 0.7105, 0.7365, 0.7635, 0.7915, 0.8205, 0.8505, 0.8817,
       0.914 , 0.9475, 0.9822])
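As a quick sanity check on the bin layout, the sketch below uses only the four bins printed in Out[4]. It checks three things that the printed numbers suggest: the first left edge is the "Lower Size (nm)" value from the metadata divided by 1000, each bin spans 1/64 of a decade (matching "Channels/Decade": "64"), and the printed midpoints agree with the geometric mean of the bin edges rounded to four decimals. The last point is an observation about these values, not a statement about how py-smps computes them.

```python
import math

# First four rows of bos.bins as printed above: (left edge, midpoint, right edge), in microns.
bins = [
    (0.0212875, 0.0217, 0.0220673),
    (0.0220673, 0.0225, 0.0228757),
    (0.0228757, 0.0233, 0.0237137),
    (0.0237137, 0.0241, 0.0245824),
]

# "Lower Size (nm)" from the metadata, converted from nm to microns.
lower_size_um = 21.2875 / 1000.0

# Width of each bin in log10 space; 64 channels per decade implies 1/64 = 0.015625.
log_widths = [math.log10(right / left) for left, _, right in bins]

# Geometric mean of the two edges, rounded to four decimals for comparison.
geometric_midpoints = [round(math.sqrt(left * right), 4) for left, _, right in bins]

for (left, mid, right), width, gmid in zip(bins, log_widths, geometric_midpoints):
    print(f"{left:.7f}-{right:.7f} um  dlogDp={width:.6f}  midpoint={mid}  sqrt(l*r)={gmid}")
```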

`SMPS.histogram` and `SMPS.raw`

SMPS.histogram contains the histogram as a pandas DataFrame. The index is a time series and can easily be manipulated. SMPS.raw contains both the histogram and all additional information that the SMPS records, including means, modes, etc. It is also a pandas DataFrame.



In [6]:

    
# Display the first few rows of the DataFrame
bos.data.head(3)









    Out[6]:







  
    
      
| timestamp | Sample # | bin0 | bin1 | bin2 | bin3 | bin4 | bin5 | bin6 | bin7 | bin8 | ... | Status Flag | td(s) | tf(s) | D50(nm) | Median(nm) | Mean(nm) | Geo. Mean(nm) | Mode(nm) | Geo. Std. Dev. | Total Conc.(#/cm³) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-11-22 15:20:48 | 1 | 938.332 | 1581.720 | 1219.210 | 1795.380 | 1216.890 | 1670.140 | 908.874 | 1653.720 | 1204.76 | ... | Normal Scan | 2.93 | 12.4094 | 1000 | 40.5183 | 66.2331 | 50.1490 | 24.1442 | 1.97913 | 697.18 |
| 2016-11-22 15:23:20 | 2 | 374.100 | 234.678 | 254.937 | 422.669 | 372.819 | 541.616 | 657.469 | 897.744 | 1084.25 | ... | Normal Scan | 2.93 | 12.4094 | 1000 | 61.8646 | 70.9645 | 64.1694 | 61.5265 | 1.52780 | 5865.59 |
| 2016-11-22 15:25:50 | 3 | 5552.500 | 3805.570 | 3505.620 | 4527.480 | 4050.570 | 3430.330 | 3077.630 | 3346.820 | 2465.44 | ... | Normal Scan | 2.93 | 12.4094 | 1000 | 41.4080 | 61.8417 | 48.9025 | 21.6739 | 1.90589 | 1913.93 |

3 rows × 132 columns

`SMPS.stats`

SMPS.stats returns summary statistics for each scan. You can weight by number, surface area, volume, or mass, and the results include the total number of particles, total surface area, total volume, total mass, the arithmetic mean (AM), the geometric mean (GM), the mode, and the geometric standard deviation (GSD).

In addition, you can integrate or calculate the stats over just a small section of the distribution by leveraging the dmin and dmax arguments.
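To make the meaning of these quantities concrete, here is a minimal sketch of number-weighted statistics computed from a dN/dlogDp distribution, with optional dmin and dmax limits. It is not the py-smps implementation: the function name `number_weighted_stats`, its signature, the constant bin width, and the two-bin distribution are all assumptions made for illustration, and the GSD uses the population (divide-by-N) form.

```python
import math


def number_weighted_stats(midpoints, dndlogdp, dlogdp, dmin=None, dmax=None):
    """Total number, geometric mean (GM), and geometric standard deviation (GSD).

    midpoints : bin midpoint diameters
    dndlogdp  : dN/dlogDp value for each bin
    dlogdp    : log10 width of each bin (assumed constant in this sketch)
    dmin/dmax : optional diameter limits, in the same units as midpoints
    """
    lo = -math.inf if dmin is None else dmin
    hi = math.inf if dmax is None else dmax

    # Keep only bins inside [dmin, dmax] and convert dN/dlogDp to a count per bin.
    selected = [(d, v * dlogdp) for d, v in zip(midpoints, dndlogdp) if lo <= d <= hi]

    total = sum(n for _, n in selected)
    log_gm = sum(n * math.log(d) for d, n in selected) / total
    variance = sum(n * (math.log(d) - log_gm) ** 2 for d, n in selected) / total

    return {
        "number": total,
        "GM": math.exp(log_gm),
        "GSD": math.exp(math.sqrt(variance)),
    }


# Hypothetical two-bin distribution chosen so the result is easy to check by hand.
midpoints = [10.0, 1000.0]
dndlogdp = [200.0, 200.0]

full = number_weighted_stats(midpoints, dndlogdp, dlogdp=0.5)
coarse = number_weighted_stats(midpoints, dndlogdp, dlogdp=0.5, dmin=50.0)

print(full)
print(coarse)
```

With equal counts at 10 and 1000, the geometric mean is 100 and the GSD is 10; restricting to diameters of at least 50 leaves a single bin, so the GM is 1000 and the GSD is 1.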



In [7]:

    
bos.stats(weight='number').head()









    Out[7]:







  
    
      
| timestamp | number | surface_area | volume | mass | AM | GM | Mode | GSD |
|---|---|---|---|---|---|---|---|---|
| 2016-11-22 15:20:48 | 697.179400 | 19.788759 | 1.036779 | 1.710686 | 66.232849 | 50.150192 | 24.1 | 1.980009 |
| 2016-11-22 15:23:20 | 5865.582059 | 122.225646 | 3.126004 | 5.157906 | 70.962424 | 64.168127 | 61.5 | 1.527811 |
| 2016-11-22 15:25:50 | 1913.922994 | 38.724580 | 1.094102 | 1.805268 | 61.841832 | 48.904347 | 21.7 | 1.906091 |
| 2016-11-22 15:28:20 | 1128.932490 | 19.119169 | 0.561025 | 0.925691 | 55.358279 | 43.893424 | 21.7 | 1.877475 |
| 2016-11-22 15:30:49 | 1118.602001 | 26.788436 | 0.814267 | 1.343541 | 69.367619 | 55.631108 | 21.7 | 1.916492 |



In [8]:

    
bos.scan_stats.head()









    Out[8]:







  
    
      
| timestamp | Status Flag | High Voltage | Scan Up Time(s) | Retrace Time(s) | Median(nm) | Mode(nm) | CPC Inlet Flow(lpm) | Total Conc.(#/cm³) | Sample # | Low Voltage | ... | Upper Size(nm) | Impactor Type(cm) | Aerosol Flow(lpm) | Down Scan First | Density(g/cc) | Scans Per Sample | D50(nm) | td(s) | tf(s) | Lower Size(nm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-11-22 15:20:48 | Normal Scan | 9735 | 120 | 30 | 40.5183 | 24.1442 | 0.3 | 697.18 | 1 | 10.6283 | ... | 1000 | None | 0.3 | False | 1 | 1 | 1000 | 2.93 | 12.4094 | 21.2875 |
| 2016-11-22 15:23:20 | Normal Scan | 9735 | 120 | 30 | 61.8646 | 61.5265 | 0.3 | 5865.59 | 2 | 10.6283 | ... | 1000 | None | 0.3 | False | 1 | 1 | 1000 | 2.93 | 12.4094 | 21.2875 |
| 2016-11-22 15:25:50 | Normal Scan | 9735 | 120 | 30 | 41.4080 | 21.6739 | 0.3 | 1913.93 | 3 | 10.6283 | ... | 1000 | None | 0.3 | False | 1 | 1 | 1000 | 2.93 | 12.4094 | 21.2875 |
| 2016-11-22 15:28:20 | Normal Scan | 9735 | 120 | 30 | 35.0935 | 21.6739 | 0.3 | 1128.94 | 4 | 10.6283 | ... | 1000 | None | 0.3 | False | 1 | 1 | 1000 | 2.93 | 12.4094 | 21.2875 |
| 2016-11-22 15:30:49 | Normal Scan | 9735 | 120 | 30 | 54.8718 | 21.6739 | 0.3 | 1118.60 | 5 | 10.6283 | ... | 1000 | None | 0.3 | False | 1 | 1 | 1000 | 2.93 | 12.4094 | 21.2875 |

5 rows × 25 columns

We can also resample the data by mean. Under the hood, this method splits the raw DataFrame into numeric and non-numeric columns, then resamples the numeric columns by mean and the non-numeric columns by 'first'. If inplace=True, the resampled data replaces the current raw DataFrame. Otherwise, a copy of the object is returned.
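The split-then-resample idea can be sketched in plain pandas. This is an illustration of the approach described above, not the py-smps source: `resample_mixed` is a hypothetical helper, and the small DataFrame reuses a few values from the tables printed earlier.

```python
import pandas as pd

# A few columns from the first four scans shown earlier in this tutorial.
raw = pd.DataFrame(
    {
        "Sample #": [1, 2, 3, 4],
        "Median(nm)": [40.5183, 61.8646, 41.4080, 35.0935],
        "Status Flag": ["Normal Scan"] * 4,
    },
    index=pd.to_datetime(
        [
            "2016-11-22 15:20:48",
            "2016-11-22 15:23:20",
            "2016-11-22 15:25:50",
            "2016-11-22 15:28:20",
        ]
    ),
)
raw.index.name = "timestamp"


def resample_mixed(df, rule):
    """Resample numeric columns by mean and all other columns by 'first' (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    other = df.select_dtypes(exclude="number")
    return numeric.resample(rule).mean().join(other.resample(rule).first())


resampled = resample_mixed(raw, "5min")
print(resampled)
```

The first 5-minute window averages samples 1 and 2, which gives a "Sample #" of 1.5 and a median of 51.19145 nm, the same values that appear in the first row of the resampled output below.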



In [9]:

    
bos.resample("5min", inplace=True)

bos.data.head(3)









    Out[9]:







  
    
      
| timestamp | Sample # | bin0 | bin1 | bin2 | bin3 | bin4 | bin5 | bin6 | bin7 | bin8 | ... | tf(s) | D50(nm) | Median(nm) | Mean(nm) | Geo. Mean(nm) | Mode(nm) | Geo. Std. Dev. | Total Conc.(#/cm³) | Impactor Type(cm) | Status Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-11-22 15:20:00 | 1.5 | 656.2160 | 908.1990 | 737.0735 | 1109.0245 | 794.8545 | 1105.8780 | 783.1715 | 1275.732 | 1144.5050 | ... | 12.4094 | 1000.0 | 51.19145 | 68.59880 | 57.15920 | 42.83535 | 1.753465 | 3281.385 | None | Normal Scan |
| 2016-11-22 15:25:00 | 3.5 | 5025.4750 | 3542.6750 | 3495.1700 | 3674.2750 | 3541.0800 | 3384.4750 | 3200.3750 | 2985.765 | 2390.5950 | ... | 12.4094 | 1000.0 | 38.25075 | 58.59965 | 46.39665 | 21.67390 | 1.891495 | 1521.435 | None | Normal Scan |
| 2016-11-22 15:30:00 | 5.5 | 1323.2865 | 1281.7545 | 1218.1965 | 1145.7940 | 1089.3235 | 1052.0015 | 846.4660 | 846.964 | 816.1545 | ... | 12.4094 | 1000.0 | 55.81775 | 75.35480 | 59.27410 | 33.09090 | 1.924750 | 746.581 | None | Normal Scan |

3 rows × 132 columns

Visualization

Okay. All we really want to do is visualize our data, right? Two common plots are a heatmap-like plot (smps.plots.heatmap) and a particle size distribution (smps.plots.histplot).

Here, we show how to use both of them. Each one returns a matplotlib axis object, which can be manipulated as you would any other matplotlib object. This makes it easy to alter how the plots look, add labels, etc.

`smps.plots.heatmap(X, Y, Z, ax=None, kind='log', cbar=True, cmap=default_cmap, fig_kws=None, cbar_kws=None, **kwargs)`

Okay, so all you really need to do to plot the heatmap is give it your X, Y, and Z data:

X: Time Axis
Y: Bin midpoints
Z: Data (usually in the format of $dN/dlogD_p$)
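The shapes of the three inputs are easy to get wrong, so here is a small sketch of how they relate. It assumes, as in the tables above, that the data is a DataFrame with one row per timestamp and one column per bin; the DataFrame and its values are hypothetical stand-ins, and only the midpoints are taken from this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a dN/dlogDp DataFrame: 4 timestamps (rows) x 3 bins (columns).
index = pd.date_range("2016-11-22 15:20", periods=4, freq="5min")
midpoints = np.array([0.0217, 0.0225, 0.0233])
dndlogdp = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=index,
    columns=["bin0", "bin1", "bin2"],
)

X = dndlogdp.index       # time axis, length 4
Y = midpoints            # bin midpoints, length 3
Z = dndlogdp.T.values    # transposed so rows follow Y and columns follow X

print(Z.shape)  # (3, 4)
```

Transposing puts the bins along the first axis of Z, so Z has shape (len(Y), len(X)).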

If the default colormap doesn't suit you (it probably isn't ideal), you can change it by passing any valid matplotlib colormap through the cmap argument. You can read more about those here or here.



In [10]:

    
X = bos.dndlogdp.index
Y = bos.midpoints
Z = bos.dndlogdp.T.values

ax = smps.plots.heatmap(X, Y, Z, cmap='viridis', fig_kws=dict(figsize=(14, 6)))

# make the x axis dates look presentable
# (imported as mdates so the name does not clash with the `dates` list used in Example 2)
import matplotlib.dates as mdates

ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[0, 6, 12, 18]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d\n%b\n%Y"))

# Go ahead and change things!
ax.set_title("Cambridge, MA Wintertime SMPS Data", y=1.02, fontsize=20);

`smps.plots.histplot(histogram, bins, ax=None, plot_kws=None, fig_kws=None, **kwargs)`

To plot a histogram, you need to provide two pieces of information:

histogram: Your histogram data! You can provide it as an array or as a DataFrame (a DataFrame is averaged before plotting)
bins: The bins for the histogram (the examples below pass SMPS.bins)

There are plenty of ways to customize these plots. You can provide additional keyword arguments for the matplotlib bar chart (plot_kws) or the figure itself (fig_kws). You can also plot on an existing axis by providing that argument.

Example 1

Let's make a basic histogram depicting the particle size distribution over the entire dataset.



In [11]:

    
ax = smps.plots.histplot(bos.dndlogdp, bos.bins, plot_kws={'linewidth': .01}, fig_kws=dict(figsize=(12,6)))

ax.set_title("Cambridge, MA Wintertime Size Distribution")
# raw string so that the LaTeX spacing command \; is not treated as a Python escape
ax.set_ylabel(r"$dN/dlogD_p \; [cm^{-3}]$")

sns.despine()

Example 2

Let's plot three separate days and make them slightly transparent. Let's also go ahead and get rid of the linewidth on the individual bars.



In [12]:

    
dates = ["2016-11-23", "2016-11-24", "2016-11-25"]

ax = None

for i, date in enumerate(dates):
    color = sns.color_palette()[i]
    plot_kws = dict(alpha=0.65, color=color, linewidth=0.)
    
    # select the rows for one day with .loc (row selection by date string)
    ax = smps.plots.histplot(bos.dndlogdp.loc[date], bos.bins, ax=ax, plot_kws=plot_kws, fig_kws=dict(figsize=(12, 6)))
    
# Add us a legend!
ax.legend(dates, loc='best')

# raw string so that the LaTeX spacing command \; is not treated as a Python escape
ax.set_ylabel(r"$dN/dlogD_p \; [cm^{-3}]$")

# Remove the spines
sns.despine()
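Selecting a single day from a time-indexed DataFrame relies on pandas partial string indexing through `.loc`. The sketch below shows the mechanism on a hypothetical two-day, two-bin DataFrame (a stand-in for the real data), followed by the per-bin average for that day.

```python
import pandas as pd

# Hypothetical stand-in for a dN/dlogDp DataFrame spanning two days.
df = pd.DataFrame(
    {
        "bin0": [100.0, 300.0, 10.0, 30.0],
        "bin1": [200.0, 400.0, 20.0, 40.0],
    },
    index=pd.to_datetime(
        [
            "2016-11-23 06:00",
            "2016-11-23 18:00",
            "2016-11-24 06:00",
            "2016-11-24 18:00",
        ]
    ),
)

# Passing a date string to .loc selects every row that falls on that day.
day = df.loc["2016-11-23"]

# Average over time to get one value per bin for that day.
day_mean = day.mean(axis=0)

print(day)
print(day_mean)
```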


